Chinese Chunking with Another Type of Spec

Authors

  • Hongqiao Li
  • Changning Huang
  • Jianfeng Gao
  • Xiaozhong Fan
Abstract

The chunking specification (spec) is a critical issue for automatic chunking. This paper proposes an approach to Chinese chunking with a different type of spec, one that is not derived from complete syntactic trees but is based only on an unbracketed, POS-tagged corpus. With this spec, a chunked corpus is built and an HMM is used to build the chunker. TBL-based error correction is then applied to further improve chunking performance. The average chunk length is about 1.38 tokens, the chunking F-measure reaches 91.13%, labeling accuracy alone reaches 99.80%, and the ratio of crossing brackets is 2.87%. We also find that the hardest part of Chinese chunking is identifying the chunk boundaries inside noun-noun sequences.
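The abstract reports a chunking F-measure and a crossing-bracket ratio without giving their exact definitions. The following is a minimal sketch of how such span-based metrics are commonly computed, not the authors' evaluation script. It assumes that a chunk counts as correct only when both its boundaries and its label match the gold standard, and that a predicted chunk "crosses" a gold chunk when the two overlap without one containing the other. The function names and the toy data are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def spans_from_chunks(chunks: List[Tuple[str, int]]) -> Set[Span]:
    """Convert a sequence of (label, length_in_tokens) chunks to token spans."""
    spans: Set[Span] = set()
    pos = 0
    for label, length in chunks:
        spans.add((pos, pos + length, label))
        pos += length
    return spans


def chunk_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure over exactly matching labeled spans."""
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def crossing_ratio(gold: Set[Span], pred: Set[Span]) -> float:
    """Share of predicted chunks that partially overlap some gold chunk."""
    def crosses(a: Span, b: Span) -> bool:
        # Overlap where neither span contains the other (labels ignored).
        return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]

    crossing = sum(1 for p in pred if any(crosses(p, g) for g in gold))
    return crossing / len(pred) if pred else 0.0


# Toy five-token sentence (hypothetical data, not from the paper).
gold = spans_from_chunks([("NP", 2), ("VP", 1), ("NP", 2)])
pred = spans_from_chunks([("NP", 1), ("NP", 2), ("NP", 2)])

precision, recall, f_measure = chunk_prf(gold, pred)
ratio = crossing_ratio(gold, pred)
```

In this toy case only the final chunk matches, and the second predicted chunk straddles a gold boundary, so it is counted as crossing.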

Similar resources

Detection of the Chinese Genotype of Infectious Bronchitis Virus (QX-type) in Iran

Case Report: A 20-day-old layer flock with mortality was recently submitted to the PCR Lab. Infectious Bronchitis Virus (IBV) was detected in the clinical samples. Results: A phylogenetic tree based on a partial S1 gene sequence showed that the Iranian IBV variant is located in the LX4-type cluster. This cluster includes all QXIBV-type viruses detected in China and European countries. The highest sequence hom...

An Empirical Study of Chinese Chunking

In this paper, we describe an empirical study of Chinese chunking on a corpus, which is extracted from UPENN Chinese Treebank-4 (CTB4). First, we compare the performance of the state-of-the-art machine learning models. Then we propose two approaches in order to improve the performance of Chinese chunking. 1) We propose an approach to resolve the special problems of Chinese chunking. This approa...

Chinese Chunking Based on Maximum Entropy Markov Models

This paper presents a new Chinese chunking method based on maximum entropy Markov models. We first present two types of Chinese chunking specifications and data sets, to which the chunking models are applied. Then we describe the hidden Markov chunking model and the maximum entropy chunking model. Based on our analysis of the two models, we propose a maximum entropy Markov chunking model th...

Chinese Chunking based on Conditional Random Fields

In this paper, we propose an approach to Chinese chunking based on the Conditional Random Fields (CRFs) model. For sequence labeling, CRFs have advantages over generative models. Furthermore, Chinese chunking is a difficult sequence labeling task. This paper describes how to use CRFs for Chinese chunking by capturing arbitrary and overlapping features. We defined different types of featur...

Chinese Chunking and Consistency Checking Using Rule-Based Method

This paper presents a rule-based chunking approach. Rule-based methods do well in analyzing the structure of natural language. To avoid conflicts among the rules, we first extract a small-scale rule set for chunking. Then we define more rules to check and correct inconsistencies. We also adopt a man-machine interaction method to solve some special language phenom...

Publication date: 2004